Skip to content

Conversation

alexlarsson
Copy link
Contributor

This changes the splitstream format a bit, with the goal of allowing splitstreams to support ostree files as well (see #144), but it it imho a generally nice change.

The primary differences are:

  • The header is not compressed
  • All referenced fs-verity objects are stored in the header, including external chunks, mapped splitstreams and (a new feature) references that are not used in chunks.
  • The mapping table is separate from the reference table (and generally smaller), and indexes into it.
  • There is a magic value to detect the file format.
  • There is a magic content type to detect the type wrapped in the stream.
  • We store a tag for what ObjectID format is used
  • The total size of the stream is stored in the header.

The ability to reference file objects in the repo even if they are not part of the splitstream "content" will be useful for the ostree support to reference file content objects.

This change also allows more efficient GC enumeration, because we don't have to parse the entire splitstream to find the referenced objects.

@alexlarsson alexlarsson force-pushed the splitstream-new-format branch from 3803dc3 to 9aedd96 Compare September 29, 2025 09:26

use crate::{sha256_from_descriptor, sha256_from_digest, tar::split_async, ContentAndVerity};

pub const TAR_LAYER_CONTENT_TYPE: u64 = 0x2a037edfcae1ffea;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment for where these came from? I'm guessing random? If so just add a comment // Random unique ID ?

That said I wonder if it wouldn't be nicer to store (variable length) strings for this in the format? Maybe it could go all the way to literally suggested to be the mediaType from OCI (if applicable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. These are random.

But, I'd rather avoid having variable length things in the header. That makes parsing it much more tricky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make it a real uuid tho

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine keeping as u64 too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments here

pub fn has_named_stream(&self, name: &str) -> bool {
let stream_path = format!("streams/refs/{}", name);

readlinkat(&self.repository, &stream_path, []).is_ok()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like "swallowing" errors like this, I'd say call stat instead and require it's S_IFLNK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I redid this with stat()

#[derive(Clone, Debug, FromBytes, Immutable, IntoBytes, KnownLayout)]
#[repr(C)]
pub struct MappingEntry {
pub body: Sha256Digest,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're changing the format here...I think it'd be nice to make this one extensible.

However...bigger picture there's another consideration: There's obviously a metric ton of binary serialization formats out there. A custom one isn't wrong necessarily but...how about say CBOR ? It has some usage and a proper RFC etc.

I guess a dividing line is "do we care about mmap()"? Probably not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitstreams are essentially thin wrappers of existing binary formats (tars, ostree objects, etc), adding just references to other composefs repo objects. I'm not sure its overly helpful to use a complicated binary format for the wrapping, especially one which is completely different from the inner format.

That said, I agree that we should make it at least a bit extensible. This MR adds a magic header, but also adding a version field and a few bytes of unused/unparsed space does seem quite useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed mmap, and the end result was, no, we don't want it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

splitstreams are essentially thin wrappers of existing binary formats (tars, ostree objects, etc), adding just references to other composefs repo objects. I'm not sure its overly helpful to use a complicated binary format for the wrapping, especially one which is completely different from the inner format.

Yeah, though the nice thing about Rust here is that for stuff like this there's a lot of well-done crates.

It also makes it a lot more obvious and easy to parse from other languages too if we can say "it's just CBOR" (or whatever).

Anyways: I'm basically fine with this as is too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although what about the algorithm agility? There's been some thoughts that for post-quantum crypto we may need to get away from sha256 in theory as far as I understand things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the right thing is to add a header size field, and skip parts we don't understand. Then we can easily extend this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new extension_size field which we skip on read.

@alexlarsson alexlarsson force-pushed the splitstream-new-format branch from 9aedd96 to 057121b Compare October 6, 2025 14:47
This changes the splitstream format a bit, with the goal of allowing
splitstreams to support ostree files as well (see containers#144)

The primary differences are:

 * The header is not compressed
 * All referenced fs-verity objects are stored in the header, including
   external chunks, mapped splitstreams and (a new feature) references
   that are not used in chunks.
 * The mapping table is separate from the reference table (and generally
   smaller), and indexes into it.
 * There is a magic value to detect the file format.
 * There is a magic content type to detect the type wrapped in the stream.
 * We store a tag for what ObjectID format is used
 * The total size of the stream is stored in the header.

The ability to reference file objects in the repo even if they are not
part of the splitstream "content" will be useful for the ostree
support to reference file content objects.

This change also allows more efficient GC enumeration, because we
don't have to parse the entire splitstream to find the referenced
objects.

Signed-off-by: Alexander Larsson <[email protected]>
@alexlarsson alexlarsson force-pushed the splitstream-new-format branch from 057121b to bed66dc Compare October 6, 2025 15:19
Copy link
Collaborator

@allisonkarlitskaya allisonkarlitskaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about this. I might have a bit of bandwidth to work on this if you like.

struct SplitstreamHeader {
magic: [u8; 7], // Contains SPLITSTREAM_MAGIC
algorithm: u8, // The fs-verity algorithm used, 1 == sha256, 2 == sha512
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also do fs-verity block size here. That's usually expressed as a bit-shift count, so 12 or 16...

We could also write it like "fsverity-sha256-12" or so as a string... some relevant discussion in #181.

n_refs: u64,
n_mappings: u64,
refs: [ObjectID; n_refs] // sorted
mappings: [MappingEntry; n_mappings] // sorted by body
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's so sketch that we hardcode sha256 here... I think that's probably OK, but maybe we'd add an extension mechanism so we could add new types of mappings tables...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this from the start:

  • magic
  • n_sections
  • n_sections * (
    • section_start
    • section_size_in_bytes
  • )

We could name the sections but I think it's quite OK to just know what the numbers mean and require that they're all present, in order. An empty section would be denoted by a zero size.

We could also get into compat vs. uncompat extensions... not sure how far it's worth going here...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants